Pattern Discovery in Distributed Databases

Authors

  • Raj Bhatnagar
  • Sriram Srinivasan
Abstract

Most algorithms for learning and pattern discovery in data assume that all the needed data is available on one computer at a single site. This assumption does not hold when a number of independent databases reside on geographically distributed nodes of a computer network. These databases cannot be moved to a single site because of size, security, privacy, and data-ownership concerns, yet together they constitute the dataset in which patterns must be discovered. Some pattern discovery algorithms can be adapted to such situations, and others become inefficient or inapplicable. In this paper we show how a decision-tree induction algorithm may be adapted for distributed data situations. We also discuss some general issues relating to the adaptability of other pattern discovery algorithms to distributed data situations.
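The abstract does not say how the data is partitioned or how the tree is built, so the following is only an illustrative sketch and not the authors' method. It assumes horizontally partitioned data (every site holds different rows with the same attributes) and an information-gain split criterion. Each site reports only class counts per attribute value, and a coordinator merges those counts to choose a split, so no raw rows leave a site. All function names and the `label` field are hypothetical.

```python
from collections import Counter
from math import log2


def local_counts(rows, attribute, label="label"):
    """Run at one site: count (attribute value, class) pairs.

    Only these counts are sent to the coordinator, never the rows.
    """
    return Counter((row[attribute], row[label]) for row in rows)


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum((n / total) * log2(n / total)
                for n in class_counts.values() if n)


def information_gain(site_counts):
    """Run at the coordinator: merge per-site counts, score one attribute."""
    merged = Counter()
    for counts in site_counts:
        merged.update(counts)  # adds counts for matching keys

    by_class = Counter()
    by_value = {}
    for (value, cls), n in merged.items():
        by_class[cls] += n
        by_value.setdefault(value, Counter())[cls] += n

    total = sum(by_class.values())
    remainder = sum(sum(c.values()) / total * entropy(c)
                    for c in by_value.values())
    return entropy(by_class) - remainder


def best_split(sites, attributes, label="label"):
    """Pick the attribute with the highest gain over all sites combined.

    `sites` is a list of row lists; in a real deployment each
    local_counts call would execute remotely at its own site.
    """
    return max(
        attributes,
        key=lambda a: information_gain(
            [local_counts(rows, a, label) for rows in sites]
        ),
    )
```

Because the merged counts equal the counts that would be computed on the pooled data, the chosen split matches what a single-site learner using the same criterion would choose under these assumptions. Repeating `best_split` on the rows that reach each child node would grow the rest of the tree.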


Similar resources

Saving whilst Gambling: An Empirical Analysis of U.K. Premium Bonds

Working papers are in draft form. This working paper is distributed for purposes of comment and discussion only. It may not be reproduced without permission of the copyright holder. Copies of working papers are available from the author.


Geographically Distributed Computing: ATM over the NASA ACTS Satellite

This paper outlines some of the problems and the solutions developed to support geographically distributed computing via ATM. In particular, applications developed with the Parallel Virtual Machine (PVM) [1] message passing library, communicating via ATM at OC3c speeds (155 Mbps) through the NASA ACTS satellite are considered. A primary goal of this work is to assess the suitability of an ATM-ba...


Collaborative Defence for Distributed Attacks (Case Study of Palestinian Information Systems)

In this paper, we develop a comprehensive approach for protecting national Palestinian information systems. We do not restrict our attention to protecting each individual organization, but rather focus on the entire ecosystem as a whole. Therefore, the developed system will be opened for participation for all Palestinian governmental and non-governmental organizations who are interested in impr...


Complexity of Protein–Protein Interaction Networks, Complexes, and Pathways

The focus of proteomic research in developing experimental techniques for protein identification and interaction studies is shifting from individual proteins to their organization in reaction pathways, complexes, and networks, i.e., to the proteome: the large-scale network comprising all protein–protein interactions in a cell, tissue...


Concurrency: A Case Study in Remote Tasking and Distributed I

Remote tasking encompasses different functionality, such as remote forking, multiple remote spawning, and task migration. In order to overcome the relatively high costs of these mechanisms, optimizations can be applied at various levels of the underlying operating system or application. Optimizations include concurrent message transmission, increased throughput and reduced latency at the distri...




Publication date: 1999